Automated Syllabus of Machine Vision Papers

Built by Rex W. Douglass @RexDouglass ; Github ; LinkedIn

Papers curated by hand, summaries and taxonomy written by LLMs.

Submit paper to add for review

Medical Image Processing and Deep Learning

> Deep Learning Techniques for Enhanced Medical Image Segmentation

>> Low-Rank Fine-Tuning Strategies for Large-Scale Models
  • Consider using low-rank-based fine-tuning strategies like LoRA to efficiently adapt large-scale models like SAM for specialized domains such as medical image segmentation, without compromising performance. (Zhang and Liu 2023)
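
To make the low-rank idea concrete, here is a minimal PyTorch sketch of a LoRA-style adapter wrapped around a frozen linear layer. The rank `r`, scaling `alpha`, and the choice of which layers to wrap are illustrative assumptions, not the configuration used by Zhang and Liu (2023).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                     # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # low-rank factor A
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # low-rank factor B (zero init)
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Illustrative usage: adapt one projection layer; in practice one would wrap, for example,
# attention projections inside a large pretrained encoder such as SAM's image encoder.
layer = LoRALinear(nn.Linear(768, 768), r=4)
out = layer(torch.randn(2, 16, 768))
```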

> Deep Learning Enhancements for Biomedical Data Classification

>> Area Under ROC Curve Optimization for ANN Training
>> Deep Learning Applications for Cancer Diagnosis
  • Consider creating your own annotated dataset when existing ones are insufficient, and use deep learning techniques like convolutional neural networks (CNNs) to improve the accuracy and efficiency of estimating prognostic factors such as Ki-67 and Tumor Infiltrating Lymphocytes (TILs) in breast cancer. (Negahbani et al. 2021)

> Enhancing Medical Image Analysis Using AI Techniques

>> Combining PSO & Two-Way Fixed Effects ANOVA for MRI Lesion Detection
  • Consider combining Particle Swarm Optimization (PSO) with two-way fixed-effects Analysis of Variance (ANOVA) as a fitness function to efficiently and accurately detect brain lesions in magnetic resonance imaging (MRI) data. (Atia et al. 2022)
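
A generic particle swarm optimizer is straightforward to sketch; the fitness function below is a toy placeholder, whereas Atia et al. (2022) score candidate segmentation parameters with a two-way fixed-effects ANOVA statistic.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer (minimization)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy usage: a placeholder quadratic fitness standing in for the ANOVA-based score.
best, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=3)
```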

> Radiomics Feature Selection and Classification Techniques

>> Open Access Datasets for Reproducible AI Studies
  • Consider publishing detailed descriptions of your open-access datasets as Medical Physics Dataset Articles (MPDAs) to promote reproducibility, generalizability, and future clinical translation of studies using quantitative data analysis or artificial intelligence/machine learning approaches. (NA?)

> Cardiac Segmentation Techniques for Clinical Assessment

>> Datasets for Detecting Abnormal Ventricular Potentials Post Ischemia
  • Prioritize creating and sharing large, diverse, and carefully annotated datasets to facilitate the development and evaluation of algorithms for detecting and characterizing abnormal ventricular potentials in post-ischemic ventricular tachycardia. (Koplan and Stevenson 2009)
>> Combinatorial Approaches for Enhanced Cardiac MR Segmentation
  • Consider combining multiple energy functionals, such as signed localized Yezzi and localized Chan-Vese, to accurately segment the endocardial and epicardial contours in cardiac magnetic resonance imaging, while also incorporating explicit controls over the equilibrium point between the inner and outer regions of the interfaces to better match physicians' knowledge when manually contouring. (NA?)

> Ensemble & Self-Supervised Learning for Retinal Diagnosis

>> Masked Autoencoder Foundation Models for Retinal Images
  • Consider using self-supervised learning (SSL) techniques, particularly masked autoencoders, to develop foundation models for retinal images, as they offer improved performance and label efficiency compared to traditional supervised learning methods and other SSL approaches. (NA?)

Scene Text Recognition Research Design

> Optimizing Text Detection & Recognition Techniques

>> Advanced Approaches for Improving Scene Text Recognition
  • Consider incorporating higher order language models, specifically n-grams derived from a large English dictionary, into a CRF framework to improve word recognition in scene text images, even for out-of-vocabulary words. (Mishra, Alahari, and Jawahar 2012)
>> Benchmarking & Enhancing Performance through Evaluation Metrics
  • Utilize standardized evaluation metrics across similar tasks to enable comparability and benchmarking, as demonstrated by the use of common evaluation protocols for text localization in Challenges 1 and 2 of the ICDAR 2013 Robust Reading Competition. (NA?)
>> Unified Multimodal Modeling for Enhanced Document Understanding
  • Consider unifying multiple modalities (such as text, image, and layout) in a single model, rather than treating them separately, to better capture the complex relationships present in document data. (Xingyu Chen et al. 2021)

> Optimization Strategies for Scene Text Recognition Models

>> Standardizing Datasets for Fair Model Comparison
  • Ensure consistency in your choice of training and evaluation datasets to enable fair and accurate comparisons of scene text recognition models. (Zeiler 2012)
>> Pre-Trained Transformer Models for Efficient Text Recognition
  • Consider utilizing pre-trained image and text transformer models for text recognition tasks, as demonstrated by the proposed TrOCR model, which achieves state-of-the-art results without requiring convolutional networks or complex pre/post-processing steps. (M. Li et al. 2021)
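
A hedged usage sketch, assuming the publicly released Hugging Face `transformers` checkpoints for TrOCR; the checkpoint name and image path are illustrative placeholders.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")

image = Image.open("text_line.png").convert("RGB")                 # a cropped text-line image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)                       # autoregressive decoding
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```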

Visual Perception Techniques for Robotics & Imaging

> Probabilistic Models & Multimodal Approaches for Image Processing

>> Bayesian Framework for Solving Matting Problem
  • Consider adopting a Bayesian framework for solving the matting problem, which involves modeling both the foreground and background color distributions with spatially-varying sets of Gaussians and using a maximum-likelihood criterion to estimate the optimal opacity, foreground and background simultaneously. (Chuang et al., n.d.)
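
For reference, the formulation in Chuang et al. rests on the compositing equation and a MAP objective of roughly the following form, where the L terms are log-likelihoods and sigma_C is the assumed camera-noise standard deviation:

```latex
% Compositing equation and (schematic) MAP objective from Bayesian matting
C = \alpha F + (1 - \alpha) B
\arg\max_{F,\,B,\,\alpha} \; L(C \mid F, B, \alpha) + L(F) + L(B) + L(\alpha),
\quad \text{with} \quad
L(C \mid F, B, \alpha) = -\frac{\lVert C - \alpha F - (1 - \alpha) B \rVert^{2}}{2 \sigma_C^{2}}
```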
>> Probabilistic Colour Estimation & Shadow Removal
  • Carefully evaluate the performance of your methods using high-quality, accurately labeled datasets, as demonstrated by the authors' finding that Bayesian illuminant estimation improves significantly when more accurate priors are derived from their new dataset of 568 indoor and outdoor images. (Gehler et al. 2008)
>> Bayesian Factorial MRF for Joint Albedo & Depth Estimation
  • Consider jointly estimating scene albedo and depth using a Bayesian probabilistic approach that models the image as a factorial Markov random field, allowing for the incorporation of natural image and depth statistics as priors on the hidden layers. (Nishino, Kratz, and Lombardi 2011)

> Shape Analysis & Modeling for Image Alignment

>> Mesh Registration with 3D Shape & Appearance Data
  • Consider incorporating both 3D shape and appearance information when developing mesh registration techniques for human meshes, as this combination can lead to improved performance and more reliable alignments. (Bogo et al. 2014)

> Feature Engineering for Object Recognition

>> Combining Depth and Colour Data for Enhanced Object Recognition
  • Consider using hierarchical matching pursuit (HMP) for unsupervised feature learning in RGB-D data, which enables superior object recognition results through sparse coding of raw data in a hierarchical structure. (“Experimental Robotics” 2013)

  • Consider combining depth and visual features for improved object recognition performance, especially in category-level tasks, as demonstrated by the superior results achieved by classifiers using both types of features compared to those using only shape or visual features. (Lai et al. 2011)
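
A minimal sketch of the early-fusion idea: concatenate pre-computed depth and colour descriptors and train a linear classifier. The random features below are stand-ins for real descriptors, and the cited works learn richer hierarchical features rather than this simple fusion.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
rgb_feats = rng.normal(size=(300, 64))      # stand-in for colour/texture descriptors, one row per object
depth_feats = rng.normal(size=(300, 32))    # stand-in for shape/depth descriptors
labels = rng.integers(0, 5, size=300)       # category labels

X = np.hstack([rgb_feats, depth_feats])     # simple early fusion of both modalities
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0)).fit(X, labels)
```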

> Indoor Scene Understanding through Advanced Modeling

>> Advanced Algorithms for Complex Indoor Scene Analysis
  • Consider adopting a detection-based approach, rather than segmentation, for parsing large-scale point clouds of indoor spaces, as this approach better fits the nature of the problem and leads to improved performance. (Armeni et al. 2016)

> Neural Fields & Reinforcement Learning for Shape Analysis

>> Reinforced Self-Criticism & Baselines for Polygon Prediction
  • Consider combining reinforcement learning with a self-critical method and a learned baseline to improve the stability and effectiveness of training models for polygon prediction tasks. (Romera-Paredes and Torr 2015)
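
Schematically, the update behind such training is the REINFORCE gradient with a baseline b; the "self-critical" variant sets b to the reward of the model's own greedy output, and the cited approach also allows a learned baseline. The notation below is generic, not taken verbatim from the paper.

```latex
% Policy-gradient update with baseline; y^s is a sampled prediction, \hat{y} the greedy one.
\nabla_\theta J(\theta) \;\approx\; \big( r(y^{s}) - b \big)\, \nabla_\theta \log p_\theta(y^{s}),
\qquad y^{s} \sim p_\theta, \quad b = r(\hat{y}) \ \text{(self-critical) or a learned estimate}
```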

> Monocular Depth Prediction using Unlabeled Data

>> Leverage Structure-From-Motion or Self-Supervision for Single-View Depth
  • Leverage large internet photo collections and modern structure-from-motion and multi-view stereo methods to generate training data for single-view depth prediction, while carefully addressing data cleaning and augmentation challenges. (Z. Li and Snavely 2018)

Multi-Object Tracking & Activity Recognition

> Advanced Techniques for Complex Multi-Target Tracking

>> Probabilistic Modeling for Object Identity and Tracking
  • Bridge the gap between raw sensory observations and higher-level concepts like object identity by using an identity criterion to define a physical event space over which probabilities can be calculated, and by incorporating appearance probabilities as a natural model for understanding how objects change over time. (Cox 1993)

  • Consider using a Bayesian framework combined with linear programming to solve complex problems involving multiple non-overlapping sources of data, such as multi-camera surveillance, by estimating the maximum a posteriori solution subject to global constraints like mutual exclusivity. (Kettnaker and Zabih, n.d.)
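
In the special case where the global mutual-exclusivity constraint reduces to a one-to-one matching, the MAP assignment can be computed with the Hungarian algorithm; the cost matrix below is a toy stand-in for the negative log posteriors that Kettnaker and Zabih derive from appearance and transit-time models, and the original work solves a more general linear program.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# neg_log_post[i, j]: hypothetical -log posterior that departure i and arrival j
# correspond to the same object, combining appearance and transit-time evidence.
neg_log_post = np.array([[0.2, 2.3, 4.1],
                         [3.0, 0.5, 2.7],
                         [2.2, 1.9, 0.4]])

rows, cols = linear_sum_assignment(neg_log_post)   # MAP one-to-one matching
```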

> Improving Multi-Object Tracking via Advanced Techniques

>> Enhancing Object Detection & Tracking with Adaptive Training Strategies
  • Consider using a trainable, end-to-end approach for object detection tasks, which jointly predicts the objects in an image rather than treating each bounding box as an independent problem. (Jia et al. 2014)

> Evaluation Metrics & Protocols for Single-Object Tracking

>> Evaluation Approaches for Short-Term vs Long-Term Tracking
  • Evaluate long-term tracking algorithms using a no-reset protocol, which allows for the assessment of a tracker's ability to handle target absence and re-detection, rather than solely measuring short-term tracking performance. (Battistone, Petrosino, and Santopietro 2017)

  • Carefully consider the choice of performance measures when evaluating short-term visual object tracking algorithms, as the variance of certain measures can lead to unreliable results on small datasets; a ranking-based methodology that accounts for statistical significance can help mitigate this issue. (NA?)

  • Consider using the Visual Object Tracking (VOT) framework when evaluating short-term single-object visual trackers, as it provides a comprehensive and rigorous approach to assessing both accuracy and robustness through its novel semi-automatic ground truth bounding box annotation methodology and no-reset experiment extension. (NA?)

> Advanced Techniques for Human Behavior Analysis

>> Bayesian Inference, Attention Mechanisms, Probabilistic Graphical Models
  • Consider incorporating Bayesian methods, specifically Hidden Markov Models (HMMs) and Coupled Hidden Markov Models (CHMMs), into your studies of human behavior, as these models allow for efficient handling of small datasets and novel behaviors while providing a clear framework for integrating prior knowledge and evidence from data. (Christensen 1999)
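
A minimal sketch using the `hmmlearn` package; Gaussian emissions, three hidden states, and the synthetic sequences are illustrative choices, and coupled HMMs as used in the cited work are not provided by `hmmlearn`.

```python
import numpy as np
from hmmlearn import hmm

# Two stand-in observation sequences of behavioural features, stacked row-wise.
seq1 = np.random.default_rng(0).normal(size=(200, 4))
seq2 = np.random.default_rng(1).normal(size=(150, 4))
X = np.vstack([seq1, seq2])

model = hmm.GaussianHMM(n_components=3, covariance_type="diag",
                        n_iter=50, random_state=0)
model.fit(X, lengths=[len(seq1), len(seq2)])   # EM training over both sequences
states = model.predict(X)                      # Viterbi decoding of hidden states
```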

> Gait Recognition Systems Improvements via Data Collection Techniques

>> Multi-Viewpoint 3D Gait Analysis for Improved Recognition
  • Consider collecting and analyzing 3D gait data from multiple viewpoints to increase the accuracy and robustness of gait recognition systems, as demonstrated by the authors' achievement of a 99.6% correct classification rate and a 4.3% equal error rate using the University of Southampton Multi-Biometric Tunnel. (Seely et al. 2008)

> Human Pose Estimation Techniques

>> Improving Articulated Human Pose Estimation with Advanced Modeling Approaches
  • Consider using a mixture of pictorial structure models (PSMs) instead of a single global model for improved accuracy in human pose estimation tasks, as this approach allows for more faithful modeling of the prior over pose and better handling of the multi-modal appearance of body parts. (NA?)

  • Consider using flexible mixtures of small, nonoriented parts instead of rigid templates for modeling articulated shapes, as this approach can improve efficiency and accuracy in detecting and estimating human poses in static images. (NA?)

> Action Recognition Optimization Techniques

>> Optimal Dataset Selection & Feature Engineering
  • Carefully consider the level of abstraction and type of action classes when selecting or creating datasets for human action recognition tasks, as this can significantly impact the performance and generalizability of your models. (Edwards, Deng, and Xie 2015)

  • Prioritize the use of controlled, multimodal datasets like the Berkeley Multimodal Human Action Database (MHAD) for developing and evaluating algorithms in human motion analysis, as it allows for more reliable and robust results compared to uncontrolled, single-modality datasets. (Ofli et al. 2013)

  • Use kinematic energy (KE) to select the most representative action poses for codebook construction, rather than randomly selecting vectors, as this leads to better clustering and improved detection and recognition results. (NA?)
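
A rough sketch of the kinematic-energy selection idea, using a simple sum-of-squared-joint-velocities proxy for KE and synthetic pose data; the frame counts, cluster count, and KE definition are illustrative assumptions rather than the cited paper's exact recipe.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
poses = rng.normal(size=(2000, 15, 3))              # stand-in: 2000 frames x 15 joints x 3D coordinates

velocity = np.diff(poses, axis=0)                   # per-frame joint velocities
ke = (velocity ** 2).sum(axis=(1, 2))               # crude kinematic-energy proxy per frame

top = np.argsort(ke)[-500:]                         # keep the 500 most energetic poses
selected = poses[1:][top].reshape(len(top), -1)     # align with velocity indexing, flatten joints
codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(selected).cluster_centers_
```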

>> Optimal Latency Reduction for Accurate Action Recognition
  • Consider the trade-off between accuracy and observational latency when developing action recognition algorithms; the authors demonstrate a logistic-regression-based classifier that reduces latency by automatically determining distinctive canonical poses from data and using them to robustly recognize actions in the presence of ambiguous poses. (NA?)

> Human Pose Estimation & Transfer Learning

>> Unified Transformer Model for Multiple Pose Tasks
  • Consider using a unified modeling approach for multiple pose-based tasks, specifically by treating text-based action labels and coordinate-based human poses as language sequences and optimizing a single auto-regressive transformer, while employing a dynamic routing mechanism to share parameters among tasks to avoid interference. (Bingel and Søgaard 2017)

Video Understanding via Datasets & Model Improvements

> Video Dataset Curation & Enhanced Action Recognition

>> Action Recognition on Realistic Videos using Contextual Information
  • Use the UCF101 dataset, which contains 101 action classes and over 13k clips of realistic user-uploaded videos with camera motion and cluttered backgrounds, to evaluate your action recognition methods and improve upon the current state-of-the-art baseline result of 44.5%. (Soomro, Zamir, and Shah 2012)
>> Large-Scale, Diverse Video Datasets for Robust Algorithm Development
  • Strive to create datasets that are well-calibrated, meaning they only favor the ground-truth representation for the task at hand and have no significant biases for other representations. (Kay et al. 2017)

  • Utilize a comprehensive benchmark dataset like DAVIS, which features high-quality, densely annotated video sequences covering multiple challenges in video object segmentation, along with three complementary evaluation metrics that capture spatial, temporal, and contour accuracy, to reliably assess and improve the performance of your algorithms. (Perazzi et al. 2016)

  • Prioritize collecting large, diverse, and accurately annotated video data sets to enable robust evaluation and improvement of object detection algorithms. (He et al. 2015)

  • Prioritize collecting large-scale, realistic, and diverse datasets with detailed annotations for evaluating event recognition algorithms in surveillance videos, as this will enable more accurate and robust model development. (Oh et al. 2011)

>> Video Summarization Techniques Using Category Information & Optimization
  • Consider using submodular optimization with partition matroid constraints to solve the problem of summarizing egocentric videos, as it allows for efficient and effective summarization that balances relevance, diversity, and compactness while incorporating gaze information to improve personalization and accuracy (see the greedy sketch after this list). (Yeung, Fathi, and Fei-Fei 2014)

  • Leverage category-specific information to improve the efficiency and accuracy of video summarization algorithms, particularly through the use of temporal segmentation and supervised importance-scoring algorithms. (Arlot, Celisse, and Harchaoui 2012)
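
To illustrate the submodular-with-partition-constraints recipe referenced above, here is a greedy sketch that maximizes a facility-location objective with a per-segment budget. The similarity matrix and budgets are toy assumptions, and the objective in Yeung, Fathi, and Fei-Fei (2014) additionally incorporates relevance and gaze terms.

```python
import numpy as np

def greedy_summary(sim, segment_ids, budget_per_segment, k):
    """Greedy maximization of the facility-location objective
    f(S) = sum_i max_{j in S} sim[i, j]
    under a partition-matroid constraint (at most budget_per_segment picks per segment)."""
    n = sim.shape[0]
    selected, coverage, used = [], np.zeros(n), {}
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j in range(n):
            if j in selected or used.get(segment_ids[j], 0) >= budget_per_segment:
                continue
            gain = np.maximum(coverage, sim[:, j]).sum() - coverage.sum()  # marginal gain
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        selected.append(best_j)
        coverage = np.maximum(coverage, sim[:, best_j])
        used[segment_ids[best_j]] = used.get(segment_ids[best_j], 0) + 1
    return selected

# Toy usage: 6 shots in 3 temporal segments, symmetric similarities, one pick per segment.
sim = np.random.default_rng(0).random((6, 6))
sim = (sim + sim.T) / 2
print(greedy_summary(sim, segment_ids=[0, 0, 1, 1, 2, 2], budget_per_segment=1, k=3))
```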

> Improving Dense Video Captioning through Hybrid Approaches

>> Hybrid Models for Enhanced Video Description Accuracy
  • Consider a two-stage adaptation process when transferring image-based vision-language models to video data, starting with fine-tuning the visual encoder on large video datasets with short captions, followed by instruction-tuning the language model on smaller video datasets with detailed captions. (Yue Zhao et al. 2024)

  • Prioritize creating a comprehensive, video-centric instruction dataset that emphasizes spatiotemporal reasoning and captures causal relationships when building a chat-centric video understanding system. (K. Li et al. 2023)

  • Consider combining bottom-up and top-down approaches to improve the accuracy of video description tasks, specifically by utilizing low-level joint distributions over video features and language as a set of lingual proposals that are then filtered by mid-level concept detectors. (NA?)

>> Temporal Dynamics vs Large Scale Pretraining
  • Consider utilizing large-scale pretraining on unannotated data to achieve significant improvements in performance for complex tasks like dense video captioning, even when high-quality annotated data is scarce or costly to obtain. (Xi Chen et al. 2022)

  • Consider the temporal dynamics and interdependencies among events when developing models for dense-captioning events in videos, as demonstrated by the authors' novel model that integrates a multi-scale event proposal module and a captioning module that utilizes context from past and future events. (Venugopalan et al. 2014)

> Video Analysis Enhanced by Temporal Models & Architectures

>> Language-Guided Segmentation & LRCNs for Sequential Data
  • Consider using a memory-augmented transformer architecture for language-guided video segmentation tasks, as it enables efficient querying of the entire video with the language expression and improves performance compared to previous state-of-the-art methods. (Liang et al. 2023)

Visual Analytics Techniques for High Dimensional Data

> Visualization Methods for Large Datasets and Graphs

>> Hierarchical Clustering & Multidimensional Scaling for Big Data Visualization
  • Consider utilizing hierarchical clustering algorithms to organize large-scale image datasets for machine learning, allowing for efficient and scalable visual exploration through a modified interactive treemap algorithm. (Bertucci et al. 2022)

  • Consider using the DenseLines technique for visualizing large numbers of time series data, as it addresses the issues of clutter and scalability associated with traditional line charts by aggregating data into bins and displaying density as a function of time and value, while still allowing for identification of outliers and dense regions (a rough sketch of this aggregation appears after this list). (Moritz and Fisher 2018)

  • Consider using hierarchical parallel coordinates, a multiresolutional extension of traditional parallel coordinates, to explore large datasets. This approach uses hierarchical clustering to organize data into nested clusters, allowing users to navigate and filter the dataset at various levels of granularity, reducing clutter and improving the identification of patterns and anomalies. (Fua, Ward, and Rundensteiner, n.d.)
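
A rough numpy sketch of the density-aggregation idea behind DenseLines, referenced above: count how many series pass through each (time, value) bin. The per-series normalization described by Moritz and Fisher (2018) is deliberately omitted, so this is only an approximation of their technique.

```python
import numpy as np

def density_matrix(series, n_value_bins=100, vmin=None, vmax=None):
    """Aggregate many time series (shape: n_series x n_time) into a
    (value-bin x time-bin) count matrix for heatmap-style display."""
    series = np.asarray(series)
    vmin = series.min() if vmin is None else vmin
    vmax = series.max() if vmax is None else vmax
    bins = np.clip(((series - vmin) / (vmax - vmin) * (n_value_bins - 1)).astype(int),
                   0, n_value_bins - 1)
    density = np.zeros((n_value_bins, series.shape[1]), dtype=int)
    for t in range(series.shape[1]):
        np.add.at(density[:, t], bins[:, t], 1)   # count series passing through each value bin
    return density

# Toy usage: 1,000 noisy sine curves aggregated into a 100 x 200 density image.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
curves = np.sin(t) + rng.normal(0, 0.3, size=(1000, 200))
d = density_matrix(curves)   # display with e.g. plt.imshow(d, origin="lower")
```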

>> Topological Attributes for Informative Volume Dataset Visualization
  • Consider using topological attributes derived from the level-set graph of a given volume dataset to create more informative and interpretable visualizations, particularly when dealing with simulated datasets where global context is essential for understanding local features. (Takeshima et al. 2005)
>> Hierarchical Graph Generation with Open Source Tool Dot
  • Consider using the open-source tool dot for generating hierarchical graphs, which employs a four-phase algorithm consisting of cycle breaking, rank assignment, node ordering, and coordinate setting to produce visually appealing and informative representations of directed graphs. (NA?)
>> Simulated Annealing Optimized Dataset Generation for Enhanced Data Visualizations
  • Consider using a simulated annealing optimization strategy to generate datasets with varied graphical appearances but identical statistical properties, which can help illustrate the importance of graphical representations in data exploration. (NA?)
>> Graphical Representations vs Tabular Presentations for Statistical Results
  • Prioritize the use of graphs over tables for presenting statistical results, especially when the primary goal is to make comparisons, as graphs are more effective for this purpose and can often convey the same information in less space. (NA?)

> Dimension Reduction Methods for Effective Data Visualization

>> Optimal Dimensionality Reduction for Word Embeddings
  • Optimize the dimensionality of word embeddings by balancing the bias-variance trade-off inherent in the Pairwise Inner Product (PIP) loss, which provides a theoretically sound and computationally efficient way to measure the dissimilarity between word embeddings. (Bahdanau, Cho, and Bengio 2014)
>> Optimizing Data Visualizations through Advanced Dimension Reduction
  • Prioritize developing visualization techniques that effectively balance global and local structure preservation, denoising, and robustness to achieve optimal results in exploring complex, high-dimensional data. (Moon et al. 2017)

  • Consider using an approximate tSNE algorithm (A-tSNE) when working with high-dimensional data in progressive visual analytics, as it provides faster initialization times and allows for interactive modifications of the data without disrupting the visual analysis process. (Pezzotti et al. 2015)

> Image Analysis & Graph Theory Methodologies

>> Edge Detection & Shape Recovery via Novel Representations
  • Consider using wedgelets, a new type of data representation, for analyzing images with edges. Wedgelets are a collection of atoms that can be used to synthesize image data of arbitrary type, and they are particularly well-suited for recovering edges in the horizon model: they achieve nearly the minimax description length for objects in horizon classes, and a fast algorithm is available for obtaining atomic decompositions of noisy image data into wedgelets. (Foll, Beaumont, and Gaggiotti 2008)

> Topological Data Analysis for Biological Networks

>> Bayesian Inference & Persistence Image Representation
  • Consider using a Bayesian framework for analyzing persistence diagrams (PDs) derived from topological data analysis (TDA) of biological networks. Specifically, the paper proposes a novel Bayesian framework that treats PDs as a collection of points distributed on a relevant domain space, allowing for simultaneous estimation of the distribution of the number of points and their spatial configuration. This approach enables researchers to better quantify the uncertainty associated with the inferred topological features of the biological networks under study. (Maroulas, Micucci, and Nasrin 2022)

> Advanced Methodologies for Network Analysis and Modeling

>> Network Clustering & Graph Theory Applications
  • Pay close attention to the relationship between the squared difference of the within-group and between-group connection probabilities, (a-b)^2, and their sum, (a+b), when analyzing cluster structures in sparse networks, as the ratio of these quantities determines whether the original clusters can be reconstructed from the observed network better than chance. (Mossel, Neeman, and Sly 2014)
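
Concretely, for a two-block sparse stochastic block model with within-group edge probability a/n and between-group edge probability b/n, the detectability threshold established in this line of work can be written as follows; below it, no algorithm labels nodes better than random guessing.

```latex
% Cluster labels can be recovered better than chance if and only if
(a - b)^2 > 2\,(a + b)
```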

> Efficient Algorithms for Large Datasets and Network Analysis

>> Entity Estimation Algorithms for Complex Datasets
  • Prioritize the development of unique entity estimation algorithms that are computationally efficient, statistically rigorous, and robust to noise in the data, particularly for large and complex datasets like the Syrian conflict dataset. (B. Chen, Shrivastava, and Steorts 2018)

> Clustering Method Selection & Optimization

>> Cluster Analysis Enhancement via Algorithmic & Distance Measure Choices
  • Consider using partitional clustering algorithms for large document datasets, as they offer lower computational requirements and often yield superior or comparable clustering performance compared to agglomerative algorithms. (Ying Zhao and Karypis 2002)
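
A minimal scikit-learn sketch of partitional document clustering; the TF-IDF settings and cluster count are illustrative, and Zhao and Karypis compare a range of criterion functions beyond plain k-means.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# A standard public corpus as a stand-in for a large document collection.
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes")).data
X = TfidfVectorizer(stop_words="english", max_features=20000, min_df=5).fit_transform(docs)
labels = MiniBatchKMeans(n_clusters=20, n_init=3, random_state=0).fit_predict(X)
```
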
>> Balancing Feature Space and Constraint Space in Hierarchical Clustering
  • Consider incorporating both feature-space and constraint-space dissimilarity matrices in your hierarchical clustering analyses, allowing you to balance the tradeoff between homogeneity in the feature space and adherence to spatial or other constraints through a mixing parameter α. (Chavent et al. 2018)

> Cluster Analysis Methodologies for Optimal Cluster Recovery

>> Optimizing Stopping Rules & Adjusting Parameters in K-Means
  • Carefully evaluate and compare various stopping rules for determining the number of clusters in a dataset, as they vary significantly in their ability to accurately recover the true cluster structure, and choose one or more of the better-performing ones while being aware of potential data-dependency issues. (Milligan and Cooper 1985)
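
In practice this often means scanning several internal indices over a range of k, as in the scikit-learn sketch below on synthetic data; the Calinski-Harabasz index used here was reportedly among the better-performing rules in Milligan and Cooper's comparison.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=0)

scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = {"silhouette": silhouette_score(X, labels),
                 "calinski_harabasz": calinski_harabasz_score(X, labels)}

best_k = max(scores, key=lambda k: scores[k]["calinski_harabasz"])   # pick k by one of the indices
```
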
>> LAS Algorithm for Biclustering Real-Valued Data
  • Consider using the LAS algorithm for biclustering real-valued data, as it is motivated by an additive submatrix model and utilizes a significance-based score function that takes into account the size and average value of the entries of a submatrix, providing a simple and effective means of detecting large average submatrices within a given data matrix. (Shabalin et al. 2009)

> Clustering & Anomaly Detection Methodologies for Complex Datasets

>> DBSCAN Variants for Scalable & Noise-Resilient Clustering
  • Carefully consider the choice of density-based clustering algorithm, particularly the DBSCAN algorithm and its many variants, taking into account factors such as scalability, ability to handle noise and varying densities, and computational complexity. (Rehman et al. 2014)
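
A minimal scikit-learn DBSCAN sketch; `eps` and `min_samples` are illustrative values and typically need tuning to the density of the data at hand.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=500, noise=0.08, random_state=0)
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(StandardScaler().fit_transform(X))
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)   # label -1 marks noise points
```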

> Clustering Methodologies for Enhanced Robustness and Accuracy

>> Advanced Metrics and Divergence Measures for Noise Resilience
  • Consider using advanced and extended metrics (dAMA and dEM) instead of traditional Euclidean distance when working with noisy data in order to improve the robustness of your clustering results. (NA?)
>> Model-Based Time Series Clustering for Bike Sharing Systems
  • Utilize a novel model-based clustering method specifically tailored for time series data, called FunFEM, to analyze and compare multiple bike sharing systems across various European cities. (Bouveyron, Côme, and Jacques 2015)
>> Bayesian Nonparametric Mixture Models for Spatial Income Analysis
  • Utilize a novel Bayesian nonparametric method combining Markov random field models and mixture of finite mixtures models to analyze spatial income Lorenz curves, which allows for simultaneous estimation of the number of clusters and the clustering configuration while accounting for both locally spatially contiguous clusters and globally discontiguous clusters. (Hu et al. 2023)

> Spectral Clustering Methodologies for Enhanced Dataset Analysis

>> Spectral Clustering Optimization through Graph Selection
  • Carefully choose the type of similarity graph and graph Laplacian used in spectral clustering, as these choices impact the performance of the clustering algorithm and the interpretation of the results. (Luxburg 2007)
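
The scikit-learn sketch below contrasts two common similarity-graph choices on a toy dataset; the parameter values are illustrative, and von Luxburg's tutorial discusses further graph constructions and Laplacian normalizations not exposed here.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# Sparse k-nearest-neighbour similarity graph (often more robust for non-convex clusters).
knn_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                                n_neighbors=10, random_state=0).fit_predict(X)

# Fully connected RBF similarity graph; results depend strongly on gamma.
rbf_labels = SpectralClustering(n_clusters=2, affinity="rbf",
                                gamma=1.0, random_state=0).fit_predict(X)
```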

References

Arlot, Sylvain, Alain Celisse, and Zaid Harchaoui. 2012. “A Kernel Multiple Change-Point Algorithm via Model Selection.” arXiv. https://doi.org/10.48550/ARXIV.1202.3878.
Armeni, Iro, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 2016. “3D Semantic Parsing of Large-Scale Indoor Spaces.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June. https://doi.org/10.1109/cvpr.2016.170.
Atia, Naoual, Amir Benzaoui, Sébastien Jacques, Madina Hamiane, Kaouther El Kourd, Ayache Bouakaz, and Abdeldjalil Ouahabi. 2022. “Particle Swarm Optimization and Two-Way Fixed-Effects Analysis of Variance for Efficient Brain Tumor Segmentation.” Cancers 14 (September). https://doi.org/10.3390/cancers14184399.
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” arXiv. https://doi.org/10.48550/ARXIV.1409.0473.
Battistone, Francesco, Alfredo Petrosino, and Vincenzo Santopietro. 2017. “Watch Out: Embedded Video Tracking with BST for Unmanned Aerial Vehicles.” Journal of Signal Processing Systems 90 (September). https://doi.org/10.1007/s11265-017-1279-x.
Bertucci, Donald, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng. 2022. “DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps.” arXiv. https://doi.org/10.48550/ARXIV.2205.06935.
Bingel, Joachim, and Anders Søgaard. 2017. “Identifying Beneficial Task Relations for Multi-Task Learning in Deep Neural Networks.” arXiv. https://doi.org/10.48550/ARXIV.1702.08303.
Bogo, Federica, Javier Romero, Matthew Loper, and Michael J. Black. 2014. “FAUST: Dataset and Evaluation for 3D Mesh Registration.” 2014 IEEE Conference on Computer Vision and Pattern Recognition, June. https://doi.org/10.1109/cvpr.2014.491.
Bouveyron, Charles, Etienne Côme, and Julien Jacques. 2015. “The Discriminative Functional Mixture Model for a Comparative Analysis of Bike Sharing Systems.” The Annals of Applied Statistics 9 (December). https://doi.org/10.1214/15-aoas861.
Chavent, Marie, Vanessa Kuentz-Simonet, Amaury Labenne, and Jérôme Saracco. 2018. “ClustGeo: An r Package for Hierarchical Clustering with Spatial Constraints.” Computational Statistics 33 (January). https://doi.org/10.1007/s00180-018-0791-1.
Chen, Beidi, Anshumali Shrivastava, and Rebecca C. Steorts. 2018. “Unique Entity Estimation with Application to the Syrian Conflict.” The Annals of Applied Statistics 12 (June). https://doi.org/10.1214/18-aoas1163.
Chen, Xingyu, Zihan Zhao, Lu Chen, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, and Kai Yu. 2021. “WebSRC: A Dataset for Web-Based Structural Reading Comprehension.” arXiv. https://doi.org/10.48550/ARXIV.2101.09465.
Chen, Xi, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, et al. 2022. “PaLI: A Jointly-Scaled Multilingual Language-Image Model.” arXiv. https://doi.org/10.48550/ARXIV.2209.06794.
Christensen, Henrik I. 1999. “Computer Vision Systems.” Lecture Notes in Computer Science. https://doi.org/10.1007/3-540-49256-9.
Chuang, Yung-Yu, B. Curless, D. H. Salesin, and R. Szeliski. n.d. “A Bayesian Approach to Digital Matting.” Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. https://doi.org/10.1109/cvpr.2001.990970.
Cox, Ingemar J. 1993. “A Review of Statistical Data Association Techniques for Motion Correspondence.” International Journal of Computer Vision 10 (February). https://doi.org/10.1007/bf01440847.
Edwards, Michael, Jingjing Deng, and Xianghua Xie. 2015. “From Pose to Activity: Surveying Datasets and Introducing CONVERSE.” arXiv. https://doi.org/10.48550/ARXIV.1511.05788.
“Experimental Robotics.” 2013. Springer Tracts in Advanced Robotics. https://doi.org/10.1007/978-3-319-00065-7.
Foll, Matthieu, Mark A Beaumont, and Oscar Gaggiotti. 2008. “An Approximate Bayesian Computation Approach to Overcome Biases That Arise When Using Amplified Fragment Length Polymorphism Markers to Study Population Structure.” Genetics 179 (June). https://doi.org/10.1534/genetics.107.084541.
Fua, Ying-Huey, M. O. Ward, and E. A. Rundensteiner. n.d. “Hierarchical Parallel Coordinates for Exploration of Large Datasets.” Proceedings Visualization ’99 (Cat. No.99CB37067). https://doi.org/10.1109/visual.1999.809866.
Gehler, Peter Vincent, Carsten Rother, Andrew Blake, Tom Minka, and Toby Sharp. 2008. “Bayesian Color Constancy Revisited.” 2008 IEEE Conference on Computer Vision and Pattern Recognition, June. https://doi.org/10.1109/cvpr.2008.4587765.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” arXiv. https://doi.org/10.48550/ARXIV.1512.03385.
Hu, Guanyu, Junxian Geng, Yishu Xue, and Huiyan Sang. 2023. “Bayesian Spatial Homogeneity Pursuit of Functional Data: An Application to the U.S. Income Distribution.” Bayesian Analysis 18 (June). https://doi.org/10.1214/22-ba1320.
Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. “Caffe: Convolutional Architecture for Fast Feature Embedding.” arXiv. https://doi.org/10.48550/ARXIV.1408.5093.
Kay, Will, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, et al. 2017. “The Kinetics Human Action Video Dataset.” arXiv. https://doi.org/10.48550/ARXIV.1705.06950.
Kettnaker, V., and R. Zabih. n.d. “Bayesian Multi-Camera Surveillance.” Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). https://doi.org/10.1109/cvpr.1999.784638.
Koplan, Bruce A., and William G. Stevenson. 2009. “Ventricular Tachycardia and Sudden Cardiac Death.” Mayo Clinic Proceedings 84 (March). https://doi.org/10.4065/84.3.289.
Lai, Kevin, Liefeng Bo, Xiaofeng Ren, and Dieter Fox. 2011. “A Large-Scale Hierarchical Multi-View RGB-d Object Dataset.” 2011 IEEE International Conference on Robotics and Automation, May. https://doi.org/10.1109/icra.2011.5980382.
Li, KunChang, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, Limin Wang, and Yu Qiao. 2023. “VideoChat: Chat-Centric Video Understanding.” arXiv. https://doi.org/10.48550/ARXIV.2305.06355.
Li, Minghao, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. 2021. “TrOCR: Transformer-Based Optical Character Recognition with Pre-Trained Models.” arXiv. https://doi.org/10.48550/ARXIV.2109.10282.
Li, Zhengqi, and Noah Snavely. 2018. “MegaDepth: Learning Single-View Depth Prediction from Internet Photos.” arXiv. https://doi.org/10.48550/ARXIV.1804.00607.
Liang, Chen, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, and Yi Yang. 2023. “Local-Global Context Aware Transformer for Language-Guided Video Segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (August). https://doi.org/10.1109/tpami.2023.3262578.
Luxburg, Ulrike von. 2007. “A Tutorial on Spectral Clustering.” arXiv. https://doi.org/10.48550/ARXIV.0711.0189.
Maroulas, Vasileios, Cassie Putman Micucci, and Farzana Nasrin. 2022. “Bayesian Topological Learning for Classifying the Structure of Biological Networks.” Bayesian Analysis 17 (September). https://doi.org/10.1214/21-ba1270.
Milligan, Glenn W., and Martha C. Cooper. 1985. “An Examination of Procedures for Determining the Number of Clusters in a Data Set.” Psychometrika 50 (June). https://doi.org/10.1007/bf02294245.
Mishra, Anand, Karteek Alahari, and Cv Jawahar. 2012. “Scene Text Recognition Using Higher Order Language Priors.” Procedings of the British Machine Vision Conference 2012. https://doi.org/10.5244/c.26.127.
Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2017. “Visualizing Structure and Transitions for Biological Data Exploration,” March. https://doi.org/10.1101/120378.
Moritz, Dominik, and Danyel Fisher. 2018. “Visualizing a Million Time Series with the Density Line Chart.” arXiv. https://doi.org/10.48550/ARXIV.1808.06019.
Mossel, Elchanan, Joe Neeman, and Allan Sly. 2014. “Reconstruction and Estimation in the Planted Partition Model.” Probability Theory and Related Fields 162 (July). https://doi.org/10.1007/s00440-014-0576-6.
Negahbani, Farzin, Rasool Sabzi, Bita Pakniyat Jahromi, Dena Firouzabadi, Fateme Movahedi, Mahsa Kohandel Shirazi, Shayan Majidi, and Amirreza Dehghanian. 2021. “PathoNet Introduced as a Deep Neural Network Backend for Evaluation of Ki-67 and Tumor-Infiltrating Lymphocytes in Breast Cancer.” Scientific Reports 11 (April). https://doi.org/10.1038/s41598-021-86912-w.
Nishino, Ko, Louis Kratz, and Stephen Lombardi. 2011. “Bayesian Defogging.” International Journal of Computer Vision 98 (November). https://doi.org/10.1007/s11263-011-0508-1.
Ofli, Ferda, Rizwan Chaudhry, Gregorij Kurillo, Rene Vidal, and Ruzena Bajcsy. 2013. “Berkeley MHAD: A Comprehensive Multimodal Human Action Database.” 2013 IEEE Workshop on Applications of Computer Vision (WACV), January. https://doi.org/10.1109/wacv.2013.6474999.
Oh, Sangmin, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee, et al. 2011. “A Large-Scale Benchmark Dataset for Event Recognition in Surveillance Video.” CVPR 2011, June. https://doi.org/10.1109/cvpr.2011.5995586.
Perazzi, F., J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung. 2016. “A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June. https://doi.org/10.1109/cvpr.2016.85.
Pezzotti, Nicola, Boudewijn P. F. Lelieveldt, Laurens van der Maaten, Thomas Höllt, Elmar Eisemann, and Anna Vilanova. 2015. “Approximated and User Steerable tSNE for Progressive Visual Analytics.” arXiv. https://doi.org/10.48550/ARXIV.1512.01655.
“Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications.” 2010. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-642-16687-7.
Rehman, Saif Ur, Sohail Asghar, Simon Fong, and S. Sarasvady. 2014. “DBSCAN: Past, Present and Future.” The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), February. https://doi.org/10.1109/icadiwt.2014.6814687.
Romera-Paredes, Bernardino, and Philip H. S. Torr. 2015. “Recurrent Instance Segmentation.” arXiv. https://doi.org/10.48550/ARXIV.1511.08250.
Seely, Richard D., Sina Samangooei, Middleton Lee, John N. Carter, and Mark S. Nixon. 2008. “The University of Southampton Multi-Biometric Tunnel and Introducing a Novel 3D Gait Dataset.” 2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems. https://doi.org/10.1109/btas.2008.4699353.
Shabalin, Andrey A., Victor J. Weigman, Charles M. Perou, and Andrew B. Nobel. 2009. “Finding Large Average Submatrices in High Dimensional Data.” The Annals of Applied Statistics 3 (September). https://doi.org/10.1214/09-aoas239.
Soomro, Khurram, Amir Roshan Zamir, and Mubarak Shah. 2012. “UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild.” arXiv. https://doi.org/10.48550/ARXIV.1212.0402.
Takeshima, Y., S. Takahashi, I. Fujishiro, and G. M. Nielson. 2005. “Introducing Topological Attributes for Objective-Based Visualization of Simulated Datasets.” Fourth International Workshop on Volume Graphics, 2005. https://doi.org/10.1109/vg.2005.194108.
Venugopalan, Subhashini, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. 2014. “Translating Videos to Natural Language Using Deep Recurrent Neural Networks.” arXiv. https://doi.org/10.48550/ARXIV.1412.4729.
Yeung, Serena, Alireza Fathi, and Li Fei-Fei. 2014. “VideoSET: Video Summary Evaluation Through Text.” arXiv. https://doi.org/10.48550/ARXIV.1406.5824.
Zeiler, Matthew D. 2012. “ADADELTA: An Adaptive Learning Rate Method.” arXiv. https://doi.org/10.48550/ARXIV.1212.5701.
Zhang, Kaidong, and Dong Liu. 2023. “Customized Segment Anything Model for Medical Image Segmentation.” arXiv. https://doi.org/10.48550/ARXIV.2304.13785.
Zhao, Ying, and George Karypis. 2002. “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” Proceedings of the Eleventh International Conference on Information and Knowledge Management, November. https://doi.org/10.1145/584792.584877.
Zhao, Yue, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, et al. 2024. “Distilling Vision-Language Models on Millions of Videos.” arXiv. https://doi.org/10.48550/ARXIV.2401.06129.